40 research outputs found

Symbiosis between the TRECVid benchmark and video libraries at the Netherlands Institute for Sound and Vision

Author: AF Smeaton
Alan F. Smeaton
B Huurnink
CGM Snoek
CV Thornley
D. Tjondronegoro
H.-T. Pu
Johan Oomen
L. Hollink
M Hertzum
Paul Over
S Shatford
Wessel Kraaij
Y Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Audiovisual archives are investing in large-scale digitisation of their analogue holdings and, in parallel, ingesting an ever-increasing amount of born-digital files into their digital storage facilities. Digitisation opens up new access paradigms and boosts re-use of audiovisual content. Query-log analyses show the shortcomings of manual annotation; archives are therefore complementing these annotations by developing novel search engines that automatically extract information from both the audio and the visual tracks. Over the past few years, the TRECVid benchmark has developed a novel relationship with the Netherlands Institute for Sound and Vision (NISV) that goes beyond the NISV just providing data and use cases to TRECVid. Prototype and demonstrator systems developed as part of TRECVid are set to become a key driver in improving the quality of search engines at the NISV and will ultimately help other audiovisual archives to offer more efficient and more fine-grained access to their collections. This paper reports the experiences of NISV in leveraging the activities of the TRECVid benchmark.

Crossref

Irish Universities

DCU Online Research Access Service

Radboud Repository

Sound and Vision Publications

The Wikipedia Image Retrieval Task

Author: CGM Snoek
JC van Gemert
M Grubinger
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The Wikipedia image retrieval task at ImageCLEF provides a testbed for the system-oriented evaluation of visual information retrieval from a collection of Wikipedia images. The aim is to investigate the effectiveness of retrieval approaches that exploit textual and visual evidence in the context of a large and heterogeneous collection of images that are searched for by users with diverse information needs. This chapter presents an overview of the available test collections, summarises the retrieval approaches employed by the groups that participated in the task during the 2008 and 2009 ImageCLEF campaigns, provides an analysis of the main evaluation results, identifies best practices for effective retrieval, and discusses open issues.

Crossref

CWI's Institutional Repository

Evaluating Multimedia Features and Fusion for Example-Based Event Detection

Author: A Hauptmann
CGM Snoek
D Xu
DG Lowe
H Jégou
I Laptev
J-M Geusebroek
JC Gemert
KEA van de Sande
L Ballan
M Merler
P Felzenszwalb
T Tuytelaars
Y-G Jiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia Events. SESAME includes multiple bag-of-words event classifiers based on single data types: low-level visual, motion, and audio features; high-level semantic visual concepts; and automatic speech recognition. Event detection performance was evaluated for each event classifier. The performance of low-level visual and motion features was improved by the use of difference coding. The accuracy of the visual concepts was nearly as strong as that of the low-level visual features. Experiments with a number of fusion methods for combining the event detection scores from these classifiers revealed that simple fusion methods, such as arithmetic mean, perform as well as or better than other, more complex fusion methods. SESAME’s performance in the 2012 TRECVID MED evaluation was one of the best reported.
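The arithmetic-mean fusion mentioned in this abstract can be illustrated with a minimal sketch. It is not SESAME's code: the function name `fuse_scores`, the classifier labels, and the scores are hypothetical, and it assumes each classifier already outputs one comparable score per video.

```python
from statistics import fmean


def fuse_scores(scores_by_classifier):
    """Late fusion: average each video's detection scores across classifiers.

    scores_by_classifier maps a classifier name to {video_id: score}.
    Only videos scored by every classifier are fused (an assumption of this sketch).
    """
    per_classifier = list(scores_by_classifier.values())
    video_ids = set.intersection(*(set(s) for s in per_classifier))
    return {vid: fmean(s[vid] for s in per_classifier) for vid in video_ids}


# Hypothetical scores from three single-modality event classifiers.
scores = {
    "visual": {"clip_a": 0.9, "clip_b": 0.2},
    "motion": {"clip_a": 0.7, "clip_b": 0.4},
    "audio": {"clip_a": 0.5, "clip_b": 0.3},
}
fused = fuse_scores(scores)
# Rank videos by fused score, highest first.
ranked = sorted(fused, key=fused.get, reverse=True)
```

Averaging has no parameters to train, which is one plausible reason a simple method can hold up against more complex fusion schemes when scores are on a similar scale.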

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Bayesian Prompt Learning for Image-Language Model Generalization

Author: Bulat A
Derakhshani MM
Martinez B
Sanchez E
Snoek CGM
Turrisi da Costa VG
Tzimiropoulos G
Publication venue: International Conference on Computer Vision
Publication date: 02/10/2023

Foundational image-language models have generated considerable interest due to their efficient adaptation to downstream tasks by prompt learning. Prompt learning treats part of the language model input as trainable while freezing the rest, and optimizes an Empirical Risk Minimization objective. However, Empirical Risk Minimization is known to suffer from distributional shifts which hurt generalizability to prompts unseen during training. By leveraging the regularization ability of Bayesian methods, we frame prompt learning from the Bayesian perspective and formulate it as a variational inference problem. Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts. Our framework is implemented by modeling the input prompt space in a probabilistic manner, as an a priori distribution which makes our proposal compatible with prompt learning approaches that are unconditional or conditional on the image. We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space, prevents learning spurious features, and exploits transferable invariant features. This results in better generalization of unseen prompts, even across different datasets and domains. Code available at: https://github.com/saic-fi/Bayesian-Prompt-Learnin
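For orientation, the generic variational-inference objective that such a formulation builds on can be written as an evidence lower bound. This is the textbook form, not necessarily the exact objective used in the paper; the symbols are assumptions introduced here: $\mathbf{p}$ is the latent prompt, $p(\mathbf{p})$ its prior, $q_\phi(\mathbf{p})$ the learned variational posterior (which may also be conditioned on the image $x$), and $y$ the label.

```latex
\log p(y \mid x)
  \;\ge\;
  \mathbb{E}_{q_\phi(\mathbf{p})}\!\left[ \log p(y \mid x, \mathbf{p}) \right]
  \;-\;
  \mathrm{KL}\!\left( q_\phi(\mathbf{p}) \,\|\, p(\mathbf{p}) \right)
```

The first term is the usual data-fit term; the KL term pulls the prompt distribution towards the prior, which is the regularising effect the abstract describes.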

Queen Mary Research Online

Multimodal extraction of events and of information about the recording activity in user generated videos

Author: CGM Snoek
Francesco Cricri
Igor D. D. Curcio
Kostadin Dabov
Moncef Gabbouj
Sujeet Mate
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref

Using manual and automated annotations to search images by semantic similarity

Author: CGM Snoek
CP Town
H Turtle
HD Wactlar
João Magalhães
JZ Wang
M Flickner
MJ Swain
N Rasiwasia
N Vasconcelos
Stefan Rüger
T Volkmer
Y Rui
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Crossref

Open Research Online (The Open University)

Everyday concept detection in visual lifelogs: validation, relationships and trends

Author: A Bovik
A Hauptmann
A Smeaton
Aiden R. Doherty
Alan F. Smeaton
AR Doherty
Cees G. M. Snoek
CGM Snoek
D Byrne
D Wang
Daragh Byrne
G Bell
Gareth J. F. Jones
H Naphade
HT Lin
J Fleiss
J Kapur
JC van Gemert
JM Geusebroek
JR Landis
MA Hoang
N O’Hare
R DeVaul
VN Vapnik
YG Jiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

The Microsoft SenseCam is a small lightweight wearable camera used to passively capture photos and other sensor readings from a user's day-to-day activities. It can capture up to 3,000 images per day, equating to almost 1 million images per year. It is used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer's life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. Within this work, we explore the applicability of semantic concept detection, a method often used within video retrieval, on the novel domain of visual lifelogs. A concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning. By doing so it determines the probability of a concept's presence. We apply detection of 27 everyday semantic concepts on a lifelog collection composed of 257,518 SenseCam images from 5 users. The results were then evaluated on a subset of 95,907 images, to determine the precision for detection of each semantic concept. We conduct further analysis on the temporal consistency, co-occurrence and trends within the detected concepts to more extensively investigate the robustness of the detectors within this novel domain. We additionally present future applications of concept detection within the domain of lifelogging.
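The per-concept precision evaluation described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the function name `precision_per_concept`, the input layout, and the sample judgements are hypothetical.

```python
from collections import defaultdict


def precision_per_concept(judgements):
    """Compute precision for each concept from manual judgements.

    judgements is an iterable of (concept, is_correct) pairs, one pair per
    image in which the detector reported the concept as present.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for concept, is_correct in judgements:
        total[concept] += 1
        correct[concept] += int(is_correct)
    # Precision = correct detections / all detections, per concept.
    return {concept: correct[concept] / total[concept] for concept in total}


# Hypothetical judgements for two of the everyday concepts.
sample = [
    ("indoors", True),
    ("indoors", True),
    ("indoors", False),
    ("people", True),
    ("people", False),
]
precision = precision_per_concept(sample)
```

Here `precision["indoors"]` is 2/3 and `precision["people"]` is 1/2, since two of three and one of two reported detections were judged correct.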

CiteSeerX

Crossref

Oxford University Research Archive

DCU Online Research Access Service

International Migration, Integration and Social Cohesion online publications

Finding Semantically Related Videos in Closed Collections

Author: A Argyriou
CGM Snoek
Christos Tzelepis
DG Lowe
F Markatopoulou
G Csurka
Herbert Bay
Jia Deng
KEA van de Sande
LE Sucar
M Baumgartner
MF Weng
Nikiforos Pittaras
O Russakovsky
P Dollár
P. Sidiropoulos
V Ferrari
X Wang
X Zhao
Y Wei
Y Yang
Zhanpeng Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Modern newsroom tools offer advanced functionality for automatic and semi-automatic content collection from the web and social media sources to accompany news stories. However, the content collected in this way often tends to be unstructured and may include irrelevant items. An important step in the verification process is to organize this content, both with respect to what it shows, and with respect to its origin. This chapter presents our efforts in this direction, which resulted in two components. One aims to detect semantic concepts in video shots, to help annotation and organization of content collections. We implement a system based on deep learning, featuring a number of advances and adaptations of existing algorithms to increase performance for the task. The other component aims to detect logos in videos in order to identify their provenance. We present our progress from a keypoint-based detection system to a system based on deep learning.

Crossref

Queen Mary Research Online